-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Refactors WorkerPool with Prestarts. #48677
Conversation
Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
"RAY_restart_workers_api_max_num_workers"; | ||
} | ||
|
||
auto pop_worker_request = std::make_shared<PopWorkerRequest>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PopWorkerRequest is supposed to be a private thing inside worker_pool. Let's just make StartNewWorkers
to accept needed parameters to construct PopWorkerRequest inside worker pool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StartNewWorker(PopWorkerRequest)
is also used by PopWorker(PopWorkerRequest)
internally. The Request does not have anything private so I guess we can just expose it and allow node_manager.cc to use it? It will be much easier. If we just expose another method with the scattered 10 arguments it doesn't have any added values any way.
// | ||
// Note: NONE of these methods guarantee that pop_worker_request.callback will be called | ||
// with the started worker. It may be called with any fitting workers. | ||
void StartNewWorker(const std::shared_ptr<PopWorkerRequest> &pop_worker_request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PrestartWorker
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is also used internally by PopWorker
Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some tests
Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
src/ray/raylet/worker_pool.cc
Outdated
@@ -92,7 +94,7 @@ WorkerPool::WorkerPool(instrumented_io_context &io_service, | |||
std::string native_library_path, | |||
std::function<void()> starting_worker_timeout_callback, | |||
int ray_debugger_external, | |||
std::function<absl::Time()> get_time) | |||
const std::function<absl::Time()> get_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why consting, makes move below a copy
const std::optional<bool> is_actor_worker; | ||
const rpc::RuntimeEnvInfo runtime_env_info; | ||
const int runtime_env_hash; | ||
const std::vector<std::string> dynamic_options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid const-ing member variables for un-movability, and removal of noexcept move constructor
Signed-off-by: Ruiyang Wang <[email protected]>
…-core Signed-off-by: Ruiyang Wang <[email protected]>
Refactors WorkerPool and adds an extra PrestartWorkers API.
Changes:
PopWorker
->StartNewWorker
->StartWorkerProcess
with their differences documented.NodeManagerService.PrestartWorkers
gRPC method. Callers can ask to prestartnum_workers
workers, capped byRAY_restart_workers_api_max_num_workers
, with runtime_env and job_id.now + idle_worker_killing_time_threshold_ms
. In this PR we change to keep a worker "keep alive until" timestamp, set toidle time + idle_worker_killing_time_threshold_ms
orcreate time + keep_alive_duration
, and compare withnow
.